Forum:Standardising page names for individuals/archived
Introduction

This discussion started a couple of years ago. Now that we have over 20,000 articles and are gaining several new active contributors every month, I think we should finalise it, save time, and probably attract and keep more contributors. First I will collect relevant extracts from other pages. Robin Patterson 13:58, 3 July 2008 (UTC)

Genealogy talk:Page names Oct-Dec 2007

(Extracts of parts that were not later fully superseded; changing heading levels to fit here, and editing very slightly.)

Finding us on Google

Your response has included the somewhat confirmatory "a lot depends on how we name things" but has not offered your solution, your "how we should name things". Please spell it out. ... Robin Patterson 14:53, 11 October 2007 (UTC)
:..... Shall we be content to wait until that far future when folks learn how to use advanced search features like allintitle? Are we going to expect that folks who are curious about genealogy and search for an ancestor already know that ancestor's birth and death dates? You are kidding. ...
: ... The standards require data to be stored in the title, data like a birth date that could be wrong. E.g. we have a redirect from an article name that says William was born in a certain year, which was wrong. OK, sure, the redirect mechanism worked, but all the articles using that link say he was born in the wrong year. Editorially, someone is certainly required to go back and change them, but take a bet on how often folks will do that. Sure, double redirects can be fixed, but what about when we have a hundred users online simultaneously, all changing things around to conform to their evolving understanding of their ancestors? It will be a mess. ...
:..... If you use search features that 90% of folks have never heard of, or rely on the expectation that people already know a lot about their ancestors, then you have radically narrowed the footprint of your potential market. ...
:What are the possible steps? They are disagreeable in different respects because they don't look genealogical. Don't get me wrong: I like the full information a long name provides at a glance.
:*Never place anything between first name and last name. Absolute ban on that. Middle names don't have to be banned, just not placed between the first and last.
:*Use WP KISS standards for naming. Keep it simple. If Fred I of England is good enough for WP, ours shouldn't be Fred I, Earl of Smith's City, 6th Viscount of Wherehouse, .... If the root name doesn't match WP's, then a good case should be made why not.
:*There is a need to disambiguate names, so I can see putting at least one number in there. Much better if it were a number, like FamilySearch's approach using AFNs. But I can go along with a single date. So on that score, we could include:
:**birth date only
:**last 4 digits of a GUID.
:*It's probably not a big deal whether it is either of these or some other single number or equivalent. I can probably go along with any alternative. Currently I use birth only.
:[[User:Phlox|Phlox]] 17:25, 11 October 2007 (UTC)
:::(Copied from User talk page to here:)
::What about people whose death dates are known but whose birth dates are unknown? -AMK152 (talk) 21:29, 10 October 2007 (UTC)
:::It's not a special case. Birth is always (befDEATHDATE), so you can specify that in this instance as well. When both are unknown, Foo Bar (?-?) disambiguates no better than Foo Bar (?). In any case, at best your point makes a GUID more attractive, .... [[User:Phlox|Phlox]] 17:36, 14 October 2007 (UTC)

Phlox proposal

#Keep the statement that these are guidelines, not blanket rules.
#Recommend that users NOT insert a middle name, initial (or anything else) between first and last name. Putting the middle name elsewhere is OK, if desired.
#:Lower priority recommendation:
#Omit the death date unless both birth and death dates are certainly known. Otherwise, use only the birth date.
::::[[User:Phlox|Phlox]] 17:27, 20 October 2007 (UTC)

Extensions and consequences

I presume we would lift Google rank even further by omitting the birth date too (thus in most cases matching the WP name, if any). That would multiply the likely number of necessary disambiguation pages by a few hundred; not a problem, a difference of degree, not of kind, because omitting the death date already multiplies the likely number by several dozen. How about it? Robin Patterson 13:03, 1 December 2007 (UTC)

If the disambiguation pages contain merely links to the pages for individuals, will that affect Google ranking? If so, I expect we could make a practice of adding detail ... to each link. ... Robin Patterson 13:03, 1 December 2007 (UTC)
:I agree with omitting the dates, as in the Wikipedia name. However, some people are known by their middle name (like John F. Kennedy or Warren G. Harding), and a lot of my Dutch and German ancestors are known to researchers by their full name, like Gerrit Hendrik te Kolste (1794-?). Calling him Gerrit te Kolste would be confusing. Anyway, I agree with omitting the dates to get something like William I, King of England instead of William I, King of England (1027-1087) .... Of course, if we do this, we will eventually develop many disambiguation pages. We are at a point in the Wikia where there are only a handful of contributors. As the number of contributors grows, so will the number of people who understand the use of disambiguation. So people will be able to find us more quickly, and those who stay will learn how disambiguation works and use it where necessary. -AMK152 (talk) 13:16, 1 December 2007 (UTC)
.....
::Thank you for taking the time to set out a possible process. But I think we can make it shorter than that.
::#Create a disambiguation page designed as the target of simple googling: John Smith. It has Template:Disambig and it lists all the John Smiths in the wiki by whatever complex page names they currently have, e.g. John Isaac Smith III (1902-1985).
::#Every John Smith (including every John Isaac Smith etc.) in the wiki can be given a full page name (in the recommended style) as soon as he is created, with a link to him on John Smith. No need for quite as many page-name changes as listed above: ideally none.
::So a person finds our John Smith page very near the top of the hit list on Google, possibly with a context extract showing that there are several people on our page, so they come to us, read the list, and then go off to find the one that seems to match theirs. Not unlike a RootsWeb search, ....
::Robin Patterson 02:27, 2 December 2007 (UTC)

I think I mentioned that disambiguation pages were one reason I was producing an info subpage for each page. When all pages have info pages, we can do the DPL disambiguation pages. E.g.:
*Barack Obama (disambiguation)
*George Bush (disambiguation)
Note that I made the second table on these pages sortable. E.g. click on the little arrow button in the Birth Place column, and all the John Smiths will be clustered by location, so readers can go straight to the place they are interested in. For this reason, I would want to display these locations in reverse Country-State-County-City order so they cluster in the most useful way for users. [[User:Phlox|Phlox]] 04:24, 2 December 2007 (UTC)
:Surely a page called Barack Obama (disambiguation) will rank lower in a Google search than Barack Obama? Why complicate things? Can't we use the simple system proposed above, ... which matches Wikipedia's prime method of disambiguation? Or are you saying we will have both? I know that this wiki has several pages with names ending in "(disambiguation)", but most of them are that way because their creators had not properly studied the way WP makes that sort of thing as simple as possible, with only a few defined exceptions.
Robin Patterson 05:35, 2 December 2007 (UTC)
::Fine with me. I like simple .... What matters is how many inbound links there are to the article. I think I posted a pointer to a good discussion of Google's algorithm, but it takes a lot of other things into consideration. My earlier point about Google only had to do with not inserting extraneous terms between first and last name, so that our pages would be eligible for the most common kinds of searches. [[User:Phlox|Phlox]] 10:23, 2 December 2007 (UTC)

Google rank of pages - adding anything to plain name?

:(Early April 2009 import of significant discussion from Phlox's talk page)
"Page name" is one of the things we should settle before wholesale importation from GEDCOMs, I agree. I read your comment about it not mattering much whether a page name contains " (disambiguation)". Doesn't the addition of anything to the plain name reduce the "percentage" part of what determines ranking? Wasn't that part of your reason for the firm decision to depart from our standard of including death dates (a decision that has not been adopted by other major contributors)? Robin Patterson 00:50, 6 December 2007 (UTC)
:Google was only part of it. The part having to do with Google is the bit about including the middle name between the first and last name. That is really dumb from a ranking perspective, because the weight of a term is determined by how close it is to the other search terms. Secondly, it matters for phrase search: "Elvis Presley" will get zero hits on our site, because the article Elvis Aaron Presley (1935-1977) makes no mention of an "Elvis Presley". Even if it did, our rankings would be submerged, because our site felt the term was not germane enough to the subject to include in the title. Really, really self-defeating...
:Anyway, you have a point that the more terms you shove into a title, the more you water it down. I was stating that adding "disambiguation" to the end is not fatal the way adding "Aaron" to the middle is. So there is a Google factor there, sure.
:The main problem I had with putting data like dates in there is that it encourages churn in the database. Every time someone comes in and has a different date in their genealogy files, or thinks that the date is not certain (it should be c1856, not 1856), they want to move the article. But our articles are aggregates: all will have at least the article page as well as an info page. Many will have separate ancestors, pictures and tree pages, as KBorland is doing. Will the contributor move all the pages properly? Maybe sometimes, but it will probably get screwed up frequently. So it presents a colossal maintenance pain for what? So that we can follow a convention designed for paper filing methods? Discovering the death date is just a click away. And the cost of saving people the time to click? A massive maintenance burden. But let's get realistic about that. Given millions of articles, it means this work simply will not get done, which means what? That's right: our site starts to turn to junk, with numerous broken articles.
:Certainly, that problem doesn't happen at 10,000 articles, but hey, are we planning for a first-tier genealogy site or not? Sure, we could fail for any number of reasons, but why plan for failure? Plan as if we are really going to pull this off. We are not going to have just one million, or just two million, articles. The GEDCOMs on file already exceed those numbers. And what happens when this goes global and snowballs? Both of us may be old guys, but within our lifetimes we are easily going to see hundreds of millions of articles on this site, Robin. That is an enormous maintenance burden for those who come after us, and we have to be hard-nosed about what we do to preempt massive maintenance burdens that don't deliver significant payoffs for users.
:I do not make the proposal lightly, but we have to look at the rationale for the conventions and challenge whether they are relevant to our problem domain. Should we use all caps for the surname? Well, a decade ago, suggesting anything different was considered a rebellious or naive idea.
:I don't see why we should be a slave to convention, especially if it delivers insignificant benefits and will have such serious costs in the future.
::[[User:Phlox|Phlox]] 02:20, 6 December 2007 (UTC)
::OK, death years are nice (and very little detriment Google-wise) if nobody's going to change them, but we can't guarantee that, so chuck them (and allow 50 times as many disambig pages, because that's a minor irritation). Same with birth years, then? They are even more susceptible to change, and a name with just one date is unconventionally odd and therefore likely to puzzle newcomers and even turn them away. Shall we then agree that an individual's page be the plain name (the same as on his or her disambig Google-target page) plus a fixed distinguisher such as a thingy-number (which you mentioned a while back, but I haven't time to look it up)? I could be happy with that. Robin Patterson 13:23, 6 December 2007 (UTC)
:Wow. Birth years too? I knew you were a closet radical. Well, I suppose it is not totally radical. After all, LDS does this in their Ancestral File, where they use NAME (AFN).
:The thingee: the generic term for what an AFN or a Genealogics Person ID is, is "UID" (unique identifier). Some genealogy sites use the term UID and export UIDs as part of their GEDCOM data. Naive users are going to create ugly links, though, e.g. David Henderson (729382). I suppose I could make an info template that looks up the data on the fly, e.g. it looks up birth and death and displays them as if the person had gone to the bother of typing the wikitext David Henderson (c1734-1810).
So what are the cons?
*Will disambig pages really only be a minor irritation? I have seen a surprising number of multiples even for uncommon names.
:::There will be thousands of them, but they are very easy to handle. Robin Patterson 03:57, 9 December 2007 (UTC)
*Audience reaction? Will folks have a Frankenstein reaction when they see David Henderson (729382) as the title of the article on their beloved ancestor? Will it really be William I of England (390351)?
*Learning curve / barrier to contributing? OK, we make it simple to generate UIDs: let them make them up. Any 6-digit number. Otherwise, if we have something more complicated, like the computer-generated UID I originally proposed, then we have to wait until I figure out how to make a widget present a form that will take the data and create the UID on an info page for them.
*The way DPL works, I am not sure that I will be able to generate a friendly name. I think it may have to show the UID, because it wants to use the real title of the article. Of course, we can put a column in there for birth and death years, so it is not that big a deal if that turns out to be unavoidable.
*Will long numbers in a title downgrade a Google hit? A possible rationale: the page is more likely to be a technical or database-dump-like page. They might do this; it is said they examine 60 or more factors. No way to know for sure. I suppose we could make it 3 digits; there might be collisions, but contributors will know when they create a page. If we make it 4, people might assume it is a date. When the 3-digit range starts to run out, we could tell people to use 6-digit numbers.
*Sing along with me: "Secret Agent man... We're giving you a number, and taking away your name". Numeric identifiers make ancestors more impersonal. Date of birth is less impersonal, but it is not without problems, e.g. John Smith (1956). Worth it? I don't know. But the impersonal factor, and how big a con it is, is a subjective judgement call.
:Any more cons?
[[User:Phlox|Phlox]] 17:22, 6 December 2007 (UTC)
::Since naming ambiguities are only going to occur, by definition, within the same surname, why not have the people who own or regularly contribute to each surname decide on a case-by-case basis how to name their individuals? For example, the Dzyban family (a small family of an ethnic minority in the hills of Poland) will probably have different page-naming needs than the Smith families of the U.S. Perhaps instead of numbers in the titles to disambiguate, there are other options as well, e.g. John Jones (1800-?) of Pittsburgh, son of Bill & Mary. Perhaps the creativity and flexibility allowed by the wiki format is its best feature, after all. I think uniformity in page naming is far less important than uniformity in categorization, since the categories we're creating are going to become the real tools that set this site apart from others in search capabilities, for example by generating automatic lists such as New York City births in 1608. Kborland 01:38, 7 December 2007 (UTC)
:::Certainly, these will be guidelines, not policy, and folks may elect to depart from them. Robin is aware that I shall be importing a very large number of articles into Familypedia in the foreseeable future. For now, that is the scope of articles that will be affected. But due to the number of articles involved, it will become the de facto standard. So although it could be altered later via bot, it would be better to have the discussion now rather than after the import.
:::Anyway, to your comments: you are right that predictability and uniformity in categories are important. But your particular example is mistaken. A hard-coded category scheme like the one you seem to suggest won't scale. Fortunately, we don't need to hard-code categories like "New York City births in 1608". The database functionality you envision will use real database queries using DPL.
:::The database features depend on stability in the names of objects, and that is a vulnerability we currently have. This, along with Google searching, is the driving motivation of the naming-convention discussion. Do you see how Template:Info categories can extract information from an info page and generate a category? Well, when there are enough "born in New York City" folks on the wiki, that template will generate a marker, along with a bunch of other markers for fields you can query, like mother's maiden name, death city, etc. Buckets of them to query on. If you attempted to hard-code them as categories, you would run into a combinatorial explosion. But you don't have to, because DPL can generate a list with a query like "get me articles with death city NYC, surname Smith, death decade 1890". I've done demo examples for Catherine Price, Bush and Obama. It can do this because all the information is in an easily locatable form, so it can be extracted easily. But what happens if the person renames the article and forgets to move the info page? Notice how many times you have to rename due to changing information about the subject? Why are people doing that? Because they are putting data that is subject to change in the name. This makes the wiki, by definition, an unstable database. Stability in naming means that info pages don't get separated from their articles.
:::Whether or not you agree this sort of solution is necessary, you are free to opt out. The info templates don't assume that there is some special number in the name that they need in order to function. They make no such assumptions, so folks can do "Bob Jones (c1863-??) of Pittsburgh, son of Bill & Mary", change it to "Bob Jones (c1863-1922) of Pittsburgh, son of Bill & Mary" when they learn the death date, and rename it to "Bob Jones (c1863-1922) of Pittsburgh, son of Bill & Catherine" when they learn Bob was actually the son from an earlier marriage, etc. No problem.
:::Just so long as you move the Info page along with it, you will get all the benefits of dynamic queries and info pages. A bit of a hassle, though, and most folks aren't going to do it.
:::And really, you don't have to bother with the info page either if you don't want to. No one is going to force anyone to do anything they don't want to. All I'm saying is that the info page features aren't going to work without info pages. You may not care about them with only 10,000 person articles. I predict you will care with millions.
::: [[User:Phlox|Phlox]] 02:43, 7 December 2007 (UTC)
:I have been doing reality checks in my head, you know, creating mental mockups of what the site would look like with these funny numbers at the end, and it seems to me they may be too disorienting, because there is no context of time. It forces the display name to be made friendly, so the real name with the funny number never shows up in the article. That is yet another bit of learning curve for our contributors. We have to tell them: forget what you know about article-name linking. Instead always use , otherwise you won't get the friendly name displayed. But even for experienced users it would be disorienting: look at what it is like when you are trying to sort out which of two people with similar names is the correct father. You are discussing these two Joe Unknowns, and the article display says Joe Unknown (1734-1776) but says only versus on the edit page. It's a PITA.
:So I am not sure my proposal won't create more problems than it solves. Maybe we just have to keep enough bot operators trained to make periodic passes after people inevitably leave messes when renaming articles. [[User:Phlox|Phlox]] 23:54, 8 December 2007 (UTC)
::Instead of a meaningless-looking number, how about a relatively change-proof decade?
::George Bush (1930s) would be a fairly easy thing to teach people, obviously meaning something to them and avoiding "c", "bef" and "aft" (mostly) and small year corrections almost entirely. Easy for a bot to put them in their "Born in the ....s" category. Until we get another George Bush born in that decade, we have two distinct pages linked from our Google-target George Bush, with no doubt in most searchers' minds about which one they want (and no need to disguise the true page name with a ); when we do get another, we (i.e. any ordinary user) can create more distinctive pages with less need for rules to specify the precise form. Robin Patterson 03:57, 9 December 2007 (UTC)
:::Yeah. Small point, though: the bot won't treat anything in the article name as data, for reasons I was intimating to Kevin. It keeps things clean. But your proposal deals with most of the problems. It is a good 80% solution for name moves: perhaps 20% will still fall on the decade cusp (1849 vs. 1850 birth) or among the remaining uncertain dates (some of my "befores" could be as much as 20 years before my guess; some are based on socio-biological minimums for likely motherhood age). Anyway, they are still going to call us weird, but I think we are getting closer, Robin. Thanks, it was a good proposal. I will do some more mental modeling and see if it holds up.
:::By the way, I was a dunderhead for not thinking about this, but I am going to have to crank down the knob on my activities here for the holiday season. I've got 4 kids aged 2-6, and I shouldn't be undertaking anything major like the info page switchover until after the 1st. So I think I will be mostly puttering around here and will hold off on any major bot runs. So happy holidays, and pass the eggnog, heavy on the nutmeg.
:::[[User:Phlox|Phlox]] 04:20, 9 December 2007 (UTC)
:Robin, two things on your "1930s" proposal:
:#If there are two George Bushes born in the 1930s, we have a disambiguation page. What article names does the disambiguation page point to? Are you proposing some variation of the funny number, then? E.g. George Bush (395) (1930s)?
:#Bill made a good point, in the context of a GEDCOM discussion, about many people's genealogy research being based on tertiary sources: simply copying a relationship because some GEDCOM file said there was a parent-child link. So for bulk GEDCOM import, a vague period in a title, like (1860s), is a completely appropriate indication to people of how they ought to regard articles that contain no primary-source evidence.
:If that convention gained traction, then perhaps it would be built on. For example, articles might be renamed to something specific if confidence in the information was elevated, for instance if a contributor verified that the imported material has sufficiently reliable sources supporting the specifics contained in the article. [[User:Phlox|Phlox]] 19:30, 10 December 2007 (UTC)

Fairly recent comments by a helpful newcomer

(Copied from material written by Vick jay 17:00, 2 July 2008 (UTC) on Forum talk:Organization. Slightly off topic but relevant to the problems that may appear with the proposed "final" page names (the targets of the disambiguation pages) and also relevant to the incentives to persuade most or all contributors to follow the guidelines.)

In the last month, I have "categorized" many, many people. There are thousands yet to do. It's ridiculous. That tells me others are uploading their family information without a clue as to what to do next. Instructions need to be clearer on how to "categorize" and connect family members AND why you must do it. ... The current method is not sensible, simply because, as it is, you, AMT and others like me must slog through innumerable entries and "fix" them. What a waste of time! As time goes by and you have more people adding their data, how on earth are you going to handle the same family members with slightly different information? E.g. Davis Stockton b. c1680, Ireland, d. 1761 Virginia, and Davis Stockton b. about 1688 County Meath, Ireland, d. 2 Jan 1762 Charlottesville, Albemarle, VA. As a Stockton researcher, I know they are the same person, but would you? Is someone here going to "edit" the data and merge the 2 Davis Stocktons? What criteria will be used? OR will you leave 2 entries for the same person?
:(Response from AMK:) Those researching the person will discuss it. If they're the exact same person, they should have the exact same data. If unknown data is estimated, the sources can be reviewed by both parties in order to determine the most precise data.
:-AMK152 (talk) 03:36, 3 July 2008 (UTC)
(I think I've got the lot there! Now for a bit of reference to subsequent page-creations.) Robin Patterson 13:58, 3 July 2008 (UTC)

More recent developments

Simple Google-friendly disambiguation pages

After some of the above discussions, the Obama page was redirected (as recommended) to the simpler form Barack Obama. It looks good to me (and the amount of detail completely removes my concern that a disambiguation page might not rank highly on Google). It is one of a dozen pages currently listed in Category:Similar person names. Eventually there could be millions; have a look at it while it's still small, in case there's a fatal flaw that needs fixing! The pages that end with "(disambiguation)" can probably go the same way as Obama, because the couple I looked at are not following the simple Wikipedia method. I'm happy to discuss that with anyone affected who's puzzled. Maybe read Wikipedia:Disambiguation first. The corresponding template is closely based on a Wikipedia one with a sort of acronym for its name; we can copy the short one as a redirect to save typing time. Robin Patterson 16:21, 3 July 2008 (UTC)
:I agree with the above.
:I think someone coming to our site looking for information on their ancestors should be able to type a first and last name into the search box and find either an article (if that is the only person with that first/last name combination) or a disambiguation page, as illustrated above, listing all the individuals with that first and last name. I think (although I don't fully understand the discussion on Google) that this would work for Google too. Bill H 15:41, 12 April 2009 (UTC)

How much care do individuals' page names need?

Based on comments by Phlox and Vick jay, among others, I recommend that we keep the names of newly created pages for individuals fairly simple and include:
* no death years unless absolutely certain
* birth years only when we are certain about them or when we know there's another person whose page name would otherwise be the same; and even then there may be a better way to distinguish them, like those that Wikipedia uses (and I think that Phlox and AMK and I agree that using the Wikipedia name exactly is very often the best idea, for more than one reason)
Many new entries for people from past centuries will therefore be Google-friendly pages in their own right, with just the first and last name. If another individual of that name appears, the person wanting to create a page for them should:
# add some disambiguating feature (such as an approximate birth year, or a Wikipedia-style word or phrase, in parentheses) to the new page's name
# "move" the original to a name that disambiguates it sufficiently too
# edit the original page (which will have become a redirect) so that it has Template:Similar person names and links to both.
"Fixing" of existing pages is very low in my priorities.
I know one of our biggest contributors has been doing sterling work with things like removing the space between "c" and a date; but I don't think it's really important at that level of detail, because we could easily get another contributor (either by carelessly not searching for near-duplicates, or by using a GEDCOM program that can't search for near-duplicates) putting up a page for that individual with a different (possibly dead-accurate) birth year; the presence or absence of a space in the existing one will be quite irrelevant then. Such near-duplicates will be found, in most cases, if at all, fairly easily on a search hit-list or just a look through a category. (Another potential use of birth-decade categories or even century categories, Vick jay!) Robin Patterson 16:21, 3 July 2008 (UTC)

More?

(Long past my bedtime. See you all!) Robin Patterson 16:21, 3 July 2008 (UTC) (4.21 am in NZ)

AMK152's comments

I went to Google and did some searching. I searched "Mary Buskirk", my great-great-grandmother: Mary Susannah Buskirk (1880-aft1930) appears at number 4 on the list; the 3 results above her are other Mary Buskirks, not her. I searched a more common name, "Thomas Putnam", looking for my ancestor. His son, Thomas Putnam, was notable enough to have his own Wikipedia article. Yet my ancestor, Thomas Putnam (1615-1686), appears at number 7. What about a very notable person? I searched "George Washington" and he doesn't appear on the first page. This is simply because he is a very notable person and appears in many places on the internet. He appears at #7 on the list when searching "Genealogy of George Washington". So we are getting good Google rank.

I agree with the disambiguation pages without the "(disambiguation)" part, and have moved some pages accordingly. This is what I prefer:
*First Middle Last (YOB-YOD)
*First names, of course we need.
*Middle names can be optional, but I would agree with keeping it to their commonly used middle name. We don't really have to give their entire middle name, like Warren G. Harding's.
*Surnames, of course we need.
*Birth year and death year help differentiate between people with the same name, as do middle names. If a person is living, only the birth year should be shown. Not John Smith (1950-?) or John Smith (1950-) or John Smith (1950-Living). Just John Smith (1950).
-AMK152 (talk) 21:33, 8 July 2008 (UTC)
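AMK152's preferred format can be sketched as a small function. This is a hypothetical Python helper: the name page_name and its parameters are made up for illustration and are not part of Familypedia or any bot mentioned in this discussion. Two behaviours are assumptions that the list above does not settle: with no known birth year it returns just the plain name, and for a deceased person whose death year is unknown it writes "(YOB-?)", the style of Gerrit Hendrik te Kolste (1794-?) earlier in this discussion.

```python
from typing import Optional


def page_name(first: str, last: str, middle: Optional[str] = None,
              yob: Optional[int] = None, yod: Optional[int] = None,
              living: bool = False) -> str:
    """Hypothetical helper: build a 'First Middle Last (YOB-YOD)' page title.

    The plain-name fallback (no birth year) and the '-?' for an unknown
    death year are assumptions, not part of the original proposal.
    """
    if living and yod is not None:
        raise ValueError("a living person cannot have a death year")
    # The middle name is optional; empty or missing parts are skipped.
    name = " ".join(part for part in (first, middle, last) if part)
    if yob is None:
        # Assumption: with no birth year, use the plain name only.
        return name
    if living:
        # Living: birth year only, never "(1950-?)", "(1950-)" or "(1950-Living)".
        return f"{name} ({yob})"
    # Assumption: a deceased person with an unknown death year gets "-?".
    death = str(yod) if yod is not None else "?"
    return f"{name} ({yob}-{death})"


if __name__ == "__main__":
    print(page_name("John", "Smith", yob=1950, living=True))
    print(page_name("Thomas", "Putnam", yob=1615, yod=1686))
    print(page_name("Gerrit", "te Kolste", middle="Hendrik", yob=1794))
```

Calling page_name("John", "Smith", yob=1950, living=True) gives John Smith (1950), matching the last bullet above.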